Language identification from short segments of speech

Authors

  • Jyotsana Balleda
  • Hema A. Murthy
  • T. Nagarajan
Abstract

Automatic language identification (LID) from a spoken utterance is a challenging problem. In this paper, we present an LID system that works for South Indian languages and Hindi. Each language is modeled using an approach based on Vector Quantisation [1]. The speech is segmented into different sounds (consonant-vowel units, CVs), and the performance of the system on each of the segments is studied. Our studies indicate that the presence of some CVs is crucial for each language. We also find that, for the same Consonant and Vowel (CV) combination, the quality of the sound differs across languages. We show that once the speech signal is segmented into CVs, it is possible to perform LID on very short segments (100-150 ms) of speech.
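The abstract does not give the details of the vector-quantisation approach it cites as [1], so the following is only a minimal sketch of one common way VQ is used for language identification, not the authors' system. It assumes one codebook per language trained with plain k-means, and a decision rule that picks the language whose codebook gives the lowest average distortion on the test segment. The function names, the 12-dimensional features, the codebook size, the 15 frames per test segment (roughly 150 ms at an assumed 10 ms frame shift), and the synthetic Gaussian data are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np


def train_codebook(features, num_codewords=8, num_iters=20, seed=0):
    """Fit a VQ codebook to feature vectors with plain k-means (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    # Initialise codewords from randomly chosen training vectors.
    idx = rng.choice(len(features), size=num_codewords, replace=False)
    codebook = features[idx].astype(float)
    for _ in range(num_iters):
        # Squared Euclidean distance from every vector to every codeword.
        dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        for k in range(num_codewords):
            members = features[nearest == k]
            if len(members) > 0:  # leave a codeword unchanged if its cell is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def average_distortion(features, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return float(dists.min(axis=1).mean())


def identify_language(features, codebooks):
    """Return the language whose codebook quantises the segment with least distortion."""
    scores = {lang: average_distortion(features, cb) for lang, cb in codebooks.items()}
    return min(scores, key=scores.get)


def make_toy_data(seed=0):
    """Synthetic stand-in for per-language acoustic features (assumption, not real speech)."""
    rng = np.random.default_rng(seed)
    centres = {"lang_a": -3.0, "lang_b": 3.0}
    train = {lang: rng.normal(c, 1.0, size=(200, 12)) for lang, c in centres.items()}
    # 15 frames per test segment: about 150 ms at an assumed 10 ms frame shift.
    test = {lang: rng.normal(c, 1.0, size=(15, 12)) for lang, c in centres.items()}
    return train, test


train, test = make_toy_data()
codebooks = {lang: train_codebook(x) for lang, x in train.items()}
predictions = {lang: identify_language(x, codebooks) for lang, x in test.items()}
print(predictions)
```

In a real system the feature matrices would come from a front end such as cepstral analysis of each segmented CV unit, and the codebooks could be trained per language or per CV class; neither choice is specified in the abstract.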

Similar resources

Accent Identification by Combining Deep Neural Networks and Recurrent Neural Networks Trained on Long and Short Term Features

Automatic identification of foreign accents is valuable for many speech systems, such as speech recognition, speaker identification, voice conversion, etc. The INTERSPEECH 2016 Native Language Sub-Challenge is to identify the native languages of non-native English speakers from eleven countries. Since differences in accent are due to both prosodic and articulation characteristics, a combination...

Features for speaker and language identification

In this paper we examine several features derived from the speech signal for the purpose of identification of speaker or language from the speech signal. Most of the current systems for speaker and language identification use spectral features from short segments of speech. There are additional features which can be derived from the residual of the speech signal, which correspond to th...

Comparison of spectral methods for spoken language identification

Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...

Spoken Language Identification using Frame Based Entropy Measures

This paper presents a real-time method for Spoken Language Identification based on the entropy of the posterior probabilities of language specific phoneme recognisers. Entropy based discriminant functions computed on short speech segments are used to compare the model fit to a specific set of observations and language identification is performed as a model selection task. The experiments, perfo...

Incorporating linguistic knowledge into automatic dialect identification of Spanish

Automatic dialect identification, like automatic language identification, has often been approached through the use of phonetic frequencies and phonetic sequence modeling. While such statistical systems perform well on language identification problems, they are less adept at the more difficult problem of automatic dialect identification, particularly on short segments of speech. In this paper ...

Publication date: 2000